Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Nov 26, 2021

Which issue does this PR close?

Closes #1367.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@viirya
Copy link
Member Author

viirya commented Nov 26, 2021

cc @Dandandan

Projection: #test.a\
\n Filter: NOT #test.c AS test.c = Boolean(false)\
\n Filter: #test.b AS test.b = Boolean(true)\
\n Filter: NOT #test.c\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Dandandan
Copy link
Contributor

thanks @viirya for the quick change / PR! I think this is a good change.

I am wondering if there's more places where we don't want to add an alias.

Also we probably need some more tests to cover filter pushdown in combination with other optimization passes like this one.

Another thing to consider might be for the filter operation to ignore the alias.

FYI @alamb WDYT

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand how aliases can mess up filter push down (perhaps filter pushdown should be stripping off the aliases)?

} else {
Ok(new_e)
match plan {
LogicalPlan::Filter { .. } => Ok(new_e),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to recommend adding a comment here about why we don't add aliases for filter.

@alamb
Copy link
Contributor

alamb commented Nov 27, 2021

Yeah, I suggest:

  1. Adding a test with the TPCH explain plans (so we can detect such errors earlier in the future)
  2. Looking into what is going on with Filter pushdown that aliases disabled them.

I "feel" like there should be nothing wrong with aliasing the output of a Filter

@viirya
Copy link
Member Author

viirya commented Nov 28, 2021

Thanks. I'm adding a test for TPCH plans. I will take a look on Filter pushdown with aliases later.

@viirya
Copy link
Member Author

viirya commented Nov 28, 2021

Currently the filter_push_down rule fails on TPCH q10 with aliased on Filter because split_members method doesn't address Alias so a predicate like #orders.o_orderdate >= Date32("8674") AND #orders.o_orderdate < Date32("8766") AND #lineitem.l_returnflag = Utf8("R") AS ... will not be split so it won't be pushed down.

By adding Alias into split_members, the rule can push down aliased predicate.

@viirya viirya changed the title Skip adding aliases for Filter split_members should be able to split aliased predicate Nov 28, 2021
}

#[tokio::test]
async fn tpch_explain_q10() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍
Maybe it would be good to have those in the tpch crate instead?
In that case we could include some / all other queries as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- filed #1377 to track this

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @viirya !

Would it be possible to move the tpch tests into the tpch crate, as suggested by @Dandandan as part of this PR?

Someone can add explain plans for the other TPCH queries as part of #1377 so I don't think that is necessary.

Thanks for sticking with this one and finding the elegant solution

split_members(left, predicates);
split_members(right, predicates);
}
Expr::Alias(expr, _) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 excellent

}

#[tokio::test]
async fn tpch_explain_q10() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- filed #1377 to track this

\n Join: #customer.c_custkey = #orders.o_custkey\
\n TableScan: customer projection=Some([0, 1, 2, 3, 4, 5, 7])\
\n Filter: #orders.o_orderdate >= Date32(\"8674\") AND #orders.o_orderdate < Date32(\"8766\")\
\n TableScan: orders projection=Some([0, 1, 4]), filters=[#orders.o_orderdate >= Date32(\"8674\"), #orders.o_orderdate < Date32(\"8766\")]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so lovely to see those filters pushed down ❤️

@alamb alamb changed the title split_members should be able to split aliased predicate Fix Predicate Pushdown: split_members should be able to split aliased predicate Nov 28, 2021
@viirya
Copy link
Member Author

viirya commented Nov 28, 2021

Thanks @alamb @Dandandan ! Yeah, if I find some time later to (or someone can) add tests for the other TPCH queries, we can move the test together.

@Dandandan Dandandan self-requested a review November 28, 2021 18:39
@Dandandan Dandandan merged commit 15db291 into apache:master Nov 28, 2021
@houqp houqp added the bug Something isn't working label Nov 29, 2021
@houqp houqp added the performance Make DataFusion faster label Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working performance Make DataFusion faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TPC-H q10 performance regression (expression for filter with added alias is not pushed down)

4 participants